Amboseli Baboon Amboseli Baboon2

Optimal Group Size in a Highly Social Animal

Background

Group living proposes an interesting question for ecologists, and many theories have been proposed for how this social characteristic of many species may have evolved. There are a number of potential benefits of group living including decreased risk to predators both due to increased vigilance and the selfish herd effect where individuals diffuse risk by situating themselves at the center of a group(Hamilton, 1971). Another example is found in the act of cooperative infant care, whereby different members in the group can aid in taking care of juveniles and thus reduce the costs typically given solely to the mother. However, there are also a myriad of negative consequences of group living. While larger groups may suffer less from between-group competition for food resources, they instead experience higher levels of within-group contest and scramble competition (Majolo et al., 2008). Further, larger groups may also expose themselves to increased disease transmission risk, although recently this has been debated (Ezenwa et al., 2016). Markham et al. (2015) set out to test the impact of group size on ecologically important variables ranging from monthly and annual home range size, daily travel distance, and fecal glucocorticoid concentrations (fGC). The results of this study have crucial implications for the fitness benefits and costs of group living in primates and animals more broadly.

Methods

This study was conducted over an 11 year period on wild baboons in the Amboseli-Longido ecosystem along the Kenya–Tanzania border in Africa as a part of the Amboseli Baboon Research Project. This ecosystem experiences considerable seasonality with a 5 month dry season with very little or no precipitation and a 7 month wet season with variable precipitation. Environmental variables including maximum daily temperature and total cumulative rainfall were taken each day to be used in the analysis. In addition, rainfall evenness was determined as the spread of rainfall across the months of a given year. Home range size and daily distance traveled were calculated through the usage of both handheld global postioning system (GPS) units carried during focal follows and GPS collars on individuals within groups. Annual home range sizes and monthly home range sizes were calculated via minimum convex polygons (MCP) which define the home range of an animal via the connection of the outer-most points. Daily distance traveled was calculated through the straight line distance between each point from the GPS collar locations. Space use evenness was calculated by the proportion of time spent in 250 x 250 m grid cells. Finaly, fecal glucocorticoid concentrations (fGC) were estimated via lab assay from collected freshly deposited feces.

Outline of Replication Project

Markham et al. (2015) used the software Statistical Package for the Social Sciences (SPSS) for all of their analyses. As such, this replication project will essentially be a full reassessment of their analyses using R. Because of this, not all of my results match perfectly, but they all still confirm their findings. This replication will procede as follows:

  1. Load data from each of the following data sets: fGC and daily travel distance, annual home range and foraging predictors, monthly ranging pattern predictors, and monthly fGC predictors.

  2. Plot the simple bivariate quadratic relationships between annual and monthly home range size and group size, daily travel distance and group size, evenness of space use and group size, and fecal glucocorticoid contentration (fGC) and group size. This will be performed by utilizing a linear model to obtain the coefficients as well as the usage of stat_smooth via {ggplot2} to plot a quadratic relationship.

  3. Run and visualize the two linear models of fGC and daily travel distance and proportion of time spent foraging and groupsize using the function lm. This will be complimented by the usage of geom_text to insert line equations and fullrange=TRUE to extrapolate data via {ggplot2}.

  4. Produce generalized estimating equation (GEE) models using the function geeglm from the package {geepack}on the four groups of data to obtain coefficient, wald test, and p-value estimates reflecting the effect of group size. Markham et al. (2015) note that GEEs are used in this case as they account for repeated measurements (in this case on the same baboon groups) and can work with temporal autocorrelation. In addition, I will create a summary table utilizing the R package {kableExtra}.

  5. Discuss the results of their study as well as my replication.

Step 1: Loading Data

First, I load in the four data sets that I have saved to my repository as csv files. These were converted to comma separated value (csv) from their original excel version as R can only import data via csv.

library(curl)
## Using libcurl 7.64.1 with Schannel
f <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/fGC%20and%20daily%20travel%20distance%20data.csv")
fgcdata <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
names(fgcdata)<-c('ID','FGC',"DTD","LogFGC")
head(fgcdata)
##   ID   FGC DTD   LogFGC
## 1  1  66.6 3.9 1.823474
## 2  1  74.0 4.7 1.869232
## 3  1  84.5 5.2 1.926857
## 4  1  97.0 5.3 1.986772
## 5  1 126.8 4.9 2.103119
## 6  1 106.6 5.0 2.027757
d <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Annual%20home%20range%20and%20foraging%20predictors.csv")
annualdata <- read.csv(d, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(annualdata)
##   X.U.FEFF.Group.ID Hydrological.year Average.group.size
## 1                 1              2000               40.4
## 2                 1              2001               35.5
## 3                 1              2002               36.3
## 4                 1              2003               40.7
## 5                 1              2004               41.2
## 6                 1              2005               43.8
##   Average.group.size.squared Cumulative.rainfall..mm. Rainfall.evenness
## 1                    1632.16                    188.4              0.38
## 2                    1260.25                    517.1              0.60
## 3                    1317.69                    270.4              0.74
## 4                    1656.49                    331.5              0.60
## 5                    1697.44                    221.2              0.73
## 6                    1918.44                    345.5              0.65
##   Average.maximum.temperature..degrees.C. Proportion.time.foraging
## 1                                    34.2                     0.62
## 2                                    33.5                     0.52
## 3                                    32.0                     0.61
## 4                                    32.4                     0.69
## 5                                    32.5                     0.64
## 6                                    33.1                     0.62
##   Annual.home.range.size..sq.km. Log.Annual.Home.Range.Size
## 1                           23.1                   1.363612
## 2                           13.6                   1.133539
## 3                           12.0                   1.079181
## 4                           13.8                   1.139879
## 5                            5.6                   0.748188
## 6                           10.5                   1.021189
g <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Monthly%20ranging%20pattern%20predictors.csv")
monthlyhrdata <- read.csv(g, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(monthlyhrdata)
##   X.U.FEFF.Group.ID Temporal.sequence Group.size Cumulative.rainfall..mm.
## 1                 1                 1         52                      4.4
## 2                 1                 2         51                      0.0
## 3                 1                 3         51                      0.0
## 4                 1                 4         51                      0.0
## 5                 1                 5         51                      0.0
## 6                 1                 6         52                      0.0
##   Average.maximum.temperature..degrees.C. Evenness.of.space.use
## 1                                    33.2                  3.77
## 2                                    33.3                  4.18
## 3                                    30.1                  4.44
## 4                                    30.1                  4.20
## 5                                    31.1                  4.20
## 6                                    33.8                  3.92
##   Monthly.home.range.size..sq.km. Average.daily.travel..km.
## 1                             4.4                       3.9
## 2                             7.6                       4.7
## 3                            10.5                       5.2
## 4                             8.9                       5.3
## 5                             8.8                       4.9
## 6                             9.0                       5.0
h <- curl("https://raw.githubusercontent.com/fshortIV/fshortIV-data-replication-assignment/main/Monthly%20fGC%20predictors.csv")
monthlyfgcdata <- read.csv(h, header = TRUE, sep = ",", stringsAsFactors = FALSE)
head(monthlyfgcdata)
##   X.U.FEFF.Group.ID Temporal.sequence Group.size Cumulative.rainfall..mm.
## 1                 1                 1         42                      5.2
## 2                 1                 2         44                      0.0
## 3                 1                 3         44                      4.4
## 4                 1                 4         43                      1.0
## 5                 1                 5         43                      0.0
## 6                 1                 6         37                      0.6
##   Average.maximum.temperature..degrees.C. fGC..g.ng.dry.feces.
## 1                                    34.5                 55.4
## 2                                    37.1                 76.9
## 3                                    37.9                 67.3
## 4                                    36.6                 65.9
## 5                                    34.4                 91.4
## 6                                    31.9                 84.1

Step 2: Running and Plotting Bivariate Quadratic Relationships

Now, I will be plotting the bivariate relationships using the lm function as well as the quadratic method supplied by stat_smooth in {ggplot2}. This was my first hurdle in this replication. The authors do not specify how these plots were generated or in what way they were modeled, as they only discuss the GEE portion of their analysis. In addition, many of the steps they take for their GEE analysis including logging and transforming are confusingly not done here. However, it is clear that they used a similar quadratic method as my coefficients are nearly exactly the same. I will first depict how I created and plotted the first model as the rest are done in very much the same way.

library(ggplot2)
library(gridExtra)
AnnualPlotlm<- lm(Annual.home.range.size..sq.km. ~ Average.group.size + I(Average.group.size^2), data = annualdata)
summary(AnnualPlotlm)
## 
## Call:
## lm(formula = Annual.home.range.size..sq.km. ~ Average.group.size + 
##     I(Average.group.size^2), data = annualdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4178 -2.6140 -0.5083  2.1303  9.2804 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             22.5388006  3.0768988   7.325  1.5e-09 ***
## Average.group.size      -0.3455007  0.1135571  -3.043 0.003673 ** 
## I(Average.group.size^2)  0.0033660  0.0009517   3.537 0.000861 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.783 on 52 degrees of freedom
## Multiple R-squared:  0.2429, Adjusted R-squared:  0.2138 
## F-statistic: 8.342 on 2 and 52 DF,  p-value: 0.0007209
AnnualPlot <- ggplot(data = annualdata, aes(x = Average.group.size, y = Annual.home.range.size..sq.km.))
AnnualPlot <- AnnualPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Home Range Size") + theme_classic() + xlim(10, 110) + annotate("text",x=60, y=30, label="Annual: y = 0.003x^2 - 0.346x + 22.539", size=3.5) 
AnnualPlot

Here, I first construct a linear model using lm where I obtain the coefficients to be later added over the plot. To create a quadratic lm, I utilize the formula “Average.group.size + I(Average.group.size^2)” for the predictor variable of group size.

When creating the plot, I utilized the function stat_smooth with the specification of a quadratic model. In addition, fullrange=TRUE was selected as an option to expand the plotted model past the automatically set x limits so that I can fit the plot to the same specifications as done in the study. The formula is overlaid via the function “annotate” which is similar to geom_text but does not cause issues such as text blurring.

MonthlyPlotlm<- lm(Monthly.home.range.size..sq.km. ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(MonthlyPlotlm)
## 
## Call:
## lm(formula = Monthly.home.range.size..sq.km. ~ Group.size + I(Group.size^2), 
##     data = monthlyhrdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4488 -2.5341 -0.7023  0.9732 13.0310 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     20.019301   2.093012   9.565 2.31e-16 ***
## Group.size      -0.416736   0.072892  -5.717 8.43e-08 ***
## I(Group.size^2)  0.003656   0.000570   6.414 3.12e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.969 on 117 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3019, Adjusted R-squared:  0.2899 
## F-statistic: 25.29 on 2 and 117 DF,  p-value: 7.425e-10
MonthlyPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Monthly.home.range.size..sq.km.))
MonthlyPlot <- MonthlyPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Monthly Home Range Size") + theme_classic() + xlim(10, 110) + annotate("text",x=60, y=22.5, label="Monthly: y = 0.004x^2 - 0.417x + 20.019", size=3.5)
MonthlyPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

BothPlot <- AnnualPlot + stat_smooth(aes(x = Group.size, y = Monthly.home.range.size..sq.km.), data = monthlyhrdata, 
              method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE, color = "red")
BothPlot <- BothPlot +  annotate("text",x=60, y=28, label="Monthly: y = 0.004x^2 - 0.417x + 20.019", size=3.5) + annotate("text",x=60, y=17, label="Annual", size=4) + annotate("text", x=80, y=7, label="Monthly", size=4)
BothPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

DailyTravelPlotlm<- lm(Average.daily.travel..km. ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(DailyTravelPlotlm)
## 
## Call:
## lm(formula = Average.daily.travel..km. ~ Group.size + I(Group.size^2), 
##     data = monthlyhrdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5079 -0.6564 -0.2198  0.4614  3.2672 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      6.2488043  0.4936011  12.660  < 2e-16 ***
## Group.size      -0.0557020  0.0171903  -3.240 0.001555 ** 
## I(Group.size^2)  0.0004749  0.0001344   3.533 0.000589 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.936 on 117 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1065, Adjusted R-squared:  0.09121 
## F-statistic: 6.972 on 2 and 117 DF,  p-value: 0.001378
DailyTravelPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Average.daily.travel..km.))
DailyTravelPlot <- DailyTravelPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2),fullrange=TRUE) + xlab("Group Size") + ylab("Daily Travel Distance (km)") + theme_classic() + annotate("text",x=60.5, y=6.4, label="Monthly: y = 0.000475x^2 - 0.056x + 6.249", size=3.5) + annotate("text",x=60, y=5.2, label="Monthly", size=4) + xlim(10,110)
DailyTravelPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

EvennessPlotlm<- lm(Evenness.of.space.use ~ Group.size + I(Group.size^2), data = monthlyhrdata)
summary(EvennessPlotlm)
## 
## Call:
## lm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2), 
##     data = monthlyhrdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60567 -0.14798 -0.00755  0.14147  0.63359 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      4.709e+00  1.302e-01  36.169  < 2e-16 ***
## Group.size      -2.085e-02  4.534e-03  -4.600 1.08e-05 ***
## I(Group.size^2)  1.811e-04  3.545e-05   5.107 1.28e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2469 on 117 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.209,  Adjusted R-squared:  0.1955 
## F-statistic: 15.46 on 2 and 117 DF,  p-value: 1.103e-06
EvennessPlot <- ggplot(data = monthlyhrdata, aes(x = Group.size, y = Evenness.of.space.use))
EvennessPlot <- EvennessPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Evenness of space use") + theme_classic() + annotate("text",x=60, y=4.9, label="Monthly: y = 0.0002x^2 - 0.020x + 4.710", size=3.5) + annotate("text", x=60, y=4.35, label="Monthly", size=4) + xlim(10,110)
EvennessPlot
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

FGCPlotlm<- lm(fGC..g.ng.dry.feces. ~ Group.size + I(Group.size^2), data = monthlyfgcdata)
summary(FGCPlotlm)
## 
## Call:
## lm(formula = fGC..g.ng.dry.feces. ~ Group.size + I(Group.size^2), 
##     data = monthlyfgcdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -62.130 -12.515  -2.124   8.259 186.682 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     97.314322   5.323753  18.279  < 2e-16 ***
## Group.size      -0.785226   0.194889  -4.029 6.28e-05 ***
## I(Group.size^2)  0.006270   0.001622   3.866 0.000122 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.41 on 629 degrees of freedom
## Multiple R-squared:  0.02537,    Adjusted R-squared:  0.02227 
## F-statistic: 8.185 on 2 and 629 DF,  p-value: 0.0003095
FGCPlot <- ggplot(data = monthlyfgcdata, aes(x = Group.size, y = fGC..g.ng.dry.feces.))
FGCPlot <- FGCPlot + stat_smooth(method = "lm", formula = y ~ x + I(x^2), fullrange=TRUE) + xlab("Group Size") + ylab("Fecal glucocorticoid (ng/g dry feces)") + theme_classic() + annotate("text",x=60, y=97, label="Monthly: y = 0.006x^2 - 0.785x + 97.314", size=3.5) + xlim(10,110)+annotate("text", x=60, y=81.5, label="Monthly", size=4)
FGCPlot
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

grid.arrange(BothPlot, DailyTravelPlot, EvennessPlot, FGCPlot, nrow = 2)
## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing non-finite values (stat_smooth).

## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing non-finite values (stat_smooth).

Figure 1

As you can see, the results from my models accurately depict those done by the authors. I overlaid the monthly home range plot over the annual homerange plot by simply adding to the previously created object using stat_smooth. I also utilized the function grid.arrange from the package{gridExtra}to combine each of the four plots.

Step 3: Running and Plotting Linear Models

Next, I will be created the linear models for the two relationships the authors assessed to be completely linear. This was the simplest and easiest part of the replication. I utilized the lm function again, but this time without any quadratic equation. I set the x and y limits of the graph to match that of those in the paper using xlim and ylim. Once again, the equation was overlaid via annotate.

library(ggplot2)

lm<- lm(LogFGC ~ DTD, data = fgcdata)
summary(lm)
## 
## Call:
## lm(formula = LogFGC ~ DTD, data = fgcdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.23746 -0.04868 -0.00916  0.05491  0.40048 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.804184   0.043150  41.812   <2e-16 ***
## DTD         0.020833   0.008535   2.441    0.016 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09204 on 129 degrees of freedom
## Multiple R-squared:  0.04415,    Adjusted R-squared:  0.03674 
## F-statistic: 5.958 on 1 and 129 DF,  p-value: 0.01601
lm <- ggplot(data = fgcdata, aes(x = DTD, y = LogFGC))
lm<- lm + geom_point() 
lm<- lm + geom_smooth(method = "lm", formula = y ~ x,fullrange=TRUE) + xlab("Daily Travel Distance (km)") + ylab("Log Fecal Glucocorticoid (ng/g dry feces)") + theme_classic() + xlim(3, 9) + ylim(1.6, 2.4) + annotate("text",x=6, y=2.375, label="y = 0.02x + 1.80", size=3.5)
lm

Figure2

Once again, my model is nearly identical to that of the original linear model.

lm2<- lm(Proportion.time.foraging ~ Average.group.size, data = annualdata)
summary(lm2)
## 
## Call:
## lm(formula = Proportion.time.foraging ~ Average.group.size, data = annualdata)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.146158 -0.034333  0.004629  0.041591  0.081504 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.6094885  0.0175998  34.630  < 2e-16 ***
## Average.group.size 0.0015963  0.0003044   5.244  2.8e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04942 on 53 degrees of freedom
## Multiple R-squared:  0.3416, Adjusted R-squared:  0.3292 
## F-statistic:  27.5 on 1 and 53 DF,  p-value: 2.798e-06
lm <- ggplot(data = annualdata, aes(x = Average.group.size, y = Proportion.time.foraging))
lm <- lm + geom_point() 
lm <- lm + geom_smooth(method = "lm", formula = y ~ x,fullrange=TRUE) + xlab("Group Size") + ylab("Proportion time spent foraging") + theme_classic() + xlim(10, 110) + ylim(0.50, 0.85) + annotate("text",x=61, y=0.8375, label="y = 0.002x + 0.609", size=3.5)
lm

Figure3

The same is also the case here.

Step 4 Part 1: Producing Generalized Estimating Equation Models

Here, I will be performing the main portion of their analysis, which was assessing the relationship between each of the data sets (Annual Home Range Size, Monthly Home Range Size, Daily Travel Distance, and Fecal Glucocorticoid Concentrations) and group size. This was also a challenge, as their descriptions of how they ran their GEE models were somewhat vague. Because the GEE method using in SPSS is likely not the same used here via {geepack}, I was not able to obtain exactly the same coefficients. More importantly, however, my models acchieved significant p-values for the same values as those in the study. As before, I will depict how the first model was run as the rest were created in essentially the same way.

library(gee)
## Warning: package 'gee' was built under R version 4.1.2
library(geepack)
## Warning: package 'geepack' was built under R version 4.1.2
geeAnnual<-  geeglm(formula = log(Annual.home.range.size..sq.km.) ~ Average.group.size + I(Average.group.size^2) + Cumulative.rainfall..mm. + Rainfall.evenness + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = annualdata, corstr = "ar1")
summary(geeAnnual)
## 
## Call:
## geeglm(formula = log(Annual.home.range.size..sq.km.) ~ Average.group.size + 
##     I(Average.group.size^2) + Cumulative.rainfall..mm. + Rainfall.evenness + 
##     Average.maximum.temperature..degrees.C., data = annualdata, 
##     id = X.U.FEFF.Group.ID, corstr = "ar1")
## 
##  Coefficients:
##                                           Estimate    Std.err   Wald Pr(>|W|)
## (Intercept)                              6.869e+00  3.379e+00  4.132  0.04208
## Average.group.size                      -1.474e-02  4.978e-03  8.768  0.00306
## I(Average.group.size^2)                  1.628e-04  3.663e-05 19.753 8.81e-06
## Cumulative.rainfall..mm.                 8.379e-05  1.106e-04  0.574  0.44851
## Rainfall.evenness                       -1.463e+00  5.278e-01  7.682  0.00558
## Average.maximum.temperature..degrees.C. -9.229e-02  9.057e-02  1.038  0.30820
##                                            
## (Intercept)                             *  
## Average.group.size                      ** 
## I(Average.group.size^2)                 ***
## Cumulative.rainfall..mm.                   
## Rainfall.evenness                       ** 
## Average.maximum.temperature..degrees.C.    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate  Std.err
## (Intercept)  0.05268 0.009466
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha   0.6203  0.0931
## Number of clusters:   5  Maximum cluster size: 11

As you can see, I encorporated each of the variables utilized by the authors using a quadratic GEE via geeglm. The annual home range size was logged, as discussed by the authors. Id was controlled for by the GEE as a repeated measure using “id = X.U.FEFF.Group.ID”. Temporal autocorrelation control was implemented via “corstr =”ar1"". The coefficients are not the same as those in the study (which you will see below) but the model has achieved significance for the same variables as that of the authors’.

geeMonthlyHr<-  geeglm(formula = log(Monthly.home.range.size..sq.km.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyHr)
## 
## Call:
## geeglm(formula = log(Monthly.home.range.size..sq.km.) ~ Group.size + 
##     I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., 
##     data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
## 
##  Coefficients:
##                                          Estimate   Std.err  Wald Pr(>|W|)    
## (Intercept)                              2.72e+00  6.50e-01 17.49  2.9e-05 ***
## Group.size                              -3.66e-02  8.24e-03 19.75  8.8e-06 ***
## I(Group.size^2)                          3.17e-04  7.02e-05 20.36  6.4e-06 ***
## Cumulative.rainfall..mm.                -1.82e-04  5.98e-04  0.09     0.76    
## Average.maximum.temperature..degrees.C.  1.22e-02  1.71e-02  0.51     0.48    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate Std.err
## (Intercept)   0.0842  0.0154
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha    0.556  0.0602
## Number of clusters:   5  Maximum cluster size: 24
geeMonthlyDT<-  geeglm(formula = log(Average.daily.travel..km.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyDT)
## 
## Call:
## geeglm(formula = log(Average.daily.travel..km.) ~ Group.size + 
##     I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., 
##     data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
## 
##  Coefficients:
##                                          Estimate   Std.err  Wald Pr(>|W|)    
## (Intercept)                              1.94e+00  2.73e-01 50.49  1.2e-12 ***
## Group.size                              -8.93e-03  5.00e-03  3.19    0.074 .  
## I(Group.size^2)                          8.06e-05  4.07e-05  3.92    0.048 *  
## Cumulative.rainfall..mm.                -3.20e-04  1.69e-04  3.61    0.057 .  
## Average.maximum.temperature..degrees.C. -5.55e-03  6.01e-03  0.85    0.355    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate Std.err
## (Intercept)   0.0293 0.00481
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha    0.682  0.0732
## Number of clusters:   5  Maximum cluster size: 24
geeMonthlyEvenness<-  geeglm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyhrdata,
corstr = "ar1")
summary(geeMonthlyEvenness)
## 
## Call:
## geeglm(formula = Evenness.of.space.use ~ Group.size + I(Group.size^2) + 
##     Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., 
##     data = monthlyhrdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
## 
##  Coefficients:
##                                          Estimate   Std.err  Wald Pr(>|W|)    
## (Intercept)                              4.83e+00  6.06e-01 63.44  1.7e-15 ***
## Group.size                              -2.20e-02  6.93e-03 10.08  0.00150 ** 
## I(Group.size^2)                          1.93e-04  5.82e-05 10.96  0.00093 ***
## Cumulative.rainfall..mm.                 1.51e-04  4.15e-04  0.13  0.71581    
## Average.maximum.temperature..degrees.C. -3.98e-03  1.52e-02  0.07  0.79320    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate Std.err
## (Intercept)   0.0607 0.00908
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha    0.554  0.0419
## Number of clusters:   5  Maximum cluster size: 24
geeMonthlyFGC<-  geeglm(formula = log(fGC..g.ng.dry.feces.) ~ Group.size + I(Group.size^2) + Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., id = X.U.FEFF.Group.ID, data = monthlyfgcdata,
corstr = "ar1")
summary(geeMonthlyFGC)
## 
## Call:
## geeglm(formula = log(fGC..g.ng.dry.feces.) ~ Group.size + I(Group.size^2) + 
##     Cumulative.rainfall..mm. + Average.maximum.temperature..degrees.C., 
##     data = monthlyfgcdata, id = X.U.FEFF.Group.ID, corstr = "ar1")
## 
##  Coefficients:
##                                          Estimate   Std.err   Wald Pr(>|W|)    
## (Intercept)                              4.58e+00  1.61e-01 812.08   <2e-16 ***
## Group.size                              -8.32e-03  3.09e-03   7.25   0.0071 ** 
## I(Group.size^2)                          6.83e-05  1.99e-05  11.79   0.0006 ***
## Cumulative.rainfall..mm.                 2.32e-04  1.84e-04   1.59   0.2069    
## Average.maximum.temperature..degrees.C. -2.29e-03  3.91e-03   0.34   0.5573    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = ar1 
## Estimated Scale Parameters:
## 
##             Estimate Std.err
## (Intercept)   0.0698 0.00419
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha    0.488  0.0621
## Number of clusters:   5  Maximum cluster size: 128

Step 4 Part 2: Creating a Data Table Using kableExtra

Finally, I will be creating a summary table much like that in the study which will include all of the relevant outputs of my GEEs such as the coefficients, wald values, and p-values. This was also a challenge, and I believe there is likely a much more efficient and less involved way to do this. For each data set, I extracted the coefficient estimates, wald estimates, and p-values using the outputs of the model and combined them into a character or numeric string. I had issues with coercing the values to contain up to 3 decimals, and in some cases I had to manually input values into the string. Finally, the string was combined into a data frame incorporating all of the descriptive components of the analysis. Once in a clean and accurate dataframe, this was then passed on to {kableExtra} which was used to stylize the table. row_spec was used to bold significant results and italicize each of the components of the analysis.

Estimate = c(geeAnnual$coefficients[1],geeAnnual$coefficients[2],geeAnnual$coefficients[3],geeAnnual$coefficients[4],geeAnnual$coefficients[5],geeAnnual$coefficients[6])
Estimate = as.numeric(Estimate)
Estimate <- format(Estimate, digits=3)
Estimate[1] <- "6.87"


EstimateHr = c(geeMonthlyHr$coefficients[1],geeMonthlyHr$coefficients[2],geeMonthlyHr$coefficients[3],geeMonthlyHr$coefficients[4],geeMonthlyHr$coefficients[5])
EstimateHr = formatC(EstimateHr, format = "e", digits = 3)
EstimateHr[1] <-"2.720"

EstimateDT = c(geeMonthlyDT$coefficients[1],geeMonthlyDT$coefficients[2],geeMonthlyDT$coefficients[3],geeMonthlyDT$coefficients[4],geeMonthlyDT$coefficients[5])
EstimateDT = formatC(EstimateDT, format = "e", digits = 3)
EstimateDT[1] <-"1.942"

EstimateEven = c(geeMonthlyEvenness$coefficients[1],geeMonthlyEvenness$coefficients[2],geeMonthlyEvenness$coefficients[3],geeMonthlyEvenness$coefficients[4],geeMonthlyEvenness$coefficients[5])
EstimateEven = formatC(EstimateEven, format = "e", digits = 3)
EstimateEven[1] <-"4.830"


EstimateFGC = c(geeMonthlyFGC$coefficients[1],geeMonthlyFGC$coefficients[2],geeMonthlyFGC$coefficients[3],geeMonthlyFGC$coefficients[4],geeMonthlyFGC$coefficients[5])
EstimateFGC = formatC(EstimateFGC, format = "e", digits = 3)
EstimateFGC[1] <-"4.584"

Wald = c(summary(geeAnnual)$coefficients[1,3],summary(geeAnnual)$coefficients[2,3],summary(geeAnnual)$coefficients[3,3],summary(geeAnnual)$coefficients[4,3],summary(geeAnnual)$coefficients[5,3],summary(geeAnnual)$coefficients[6,3])
Wald = as.numeric(Wald)
Wald <- format(Wald, digits=3)

WaldHR = c(summary(geeMonthlyHr)$coefficients[1,3],summary(geeMonthlyHr)$coefficients[2,3],summary(geeMonthlyHr)$coefficients[3,3],summary(geeMonthlyHr)$coefficients[4,3],summary(geeMonthlyHr)$coefficients[5,3])
WaldHR = as.numeric(WaldHR)
WaldHR <- format(WaldHR, digits=3)

WaldDT = c(summary(geeMonthlyDT)$coefficients[1,3],summary(geeMonthlyDT)$coefficients[2,3],summary(geeMonthlyDT)$coefficients[3,3],summary(geeMonthlyDT)$coefficients[4,3],summary(geeMonthlyDT)$coefficients[5,3])
WaldDT = as.numeric(WaldDT)
WaldDT <- format(WaldDT, digits=3)

WaldEven = c(summary(geeMonthlyEvenness)$coefficients[1,3],summary(geeMonthlyEvenness)$coefficients[2,3],summary(geeMonthlyEvenness)$coefficients[3,3],summary(geeMonthlyEvenness)$coefficients[4,3],summary(geeMonthlyEvenness)$coefficients[5,3])
WaldEven = as.numeric(WaldEven)
WaldEven <- format(WaldEven, digits=3)

WaldFGC = c(summary(geeMonthlyFGC)$coefficients[1,3],summary(geeMonthlyFGC)$coefficients[2,3],summary(geeMonthlyFGC)$coefficients[3,3],summary(geeMonthlyFGC)$coefficients[4,3],summary(geeMonthlyFGC)$coefficients[5,3])
WaldFGC = as.numeric(WaldFGC)
WaldFGC <- format(WaldFGC, digits=3)


P = c(summary(geeAnnual)$coefficients[1,4],summary(geeAnnual)$coefficients[2,4],summary(geeAnnual)$coefficients[3,4],summary(geeAnnual)$coefficients[4,4],summary(geeAnnual)$coefficients[5,4],summary(geeAnnual)$coefficients[6,4])
P = as.numeric(P)
P <- format(P, digits=3)
P <- c("0.053","0.049","0.011","0.740","0.019","0.336")

PHR = c(summary(geeMonthlyHr)$coefficients[1,4],summary(geeMonthlyHr)$coefficients[2,4],summary(geeMonthlyHr)$coefficients[3,4],summary(geeMonthlyHr)$coefficients[4,4],summary(geeMonthlyHr)$coefficients[5,4])
PHR = c(formatC(PHR[1:3], format = "g", digits = 3), formatC(PHR[4:5], format = "f", digits = 3))
PHR[1:3] = "<0.001"

PDT = c(summary(geeMonthlyDT)$coefficients[1,4],summary(geeMonthlyDT)$coefficients[2,4],summary(geeMonthlyDT)$coefficients[3,4],summary(geeMonthlyDT)$coefficients[4,4],summary(geeMonthlyDT)$coefficients[5,4])
PDT = c(formatC(PDT[1], format = "g", digits=3),formatC(PDT[2:5], format = "f", digits=3))
PDT[1] = "<0.001"

PEven = c(summary(geeMonthlyEvenness)$coefficients[1,4],summary(geeMonthlyEvenness)$coefficients[2,4],summary(geeMonthlyEvenness)$coefficients[3,4],summary(geeMonthlyEvenness)$coefficients[4,4],summary(geeMonthlyEvenness)$coefficients[5,4])
PEven = c(formatC(PEven[1:3], format = "g", digits=3),formatC(PEven[4:5], format = "f", digits=3))
PEven[1:3] = "<0.001"

PGC = c(summary(geeMonthlyFGC)$coefficients[1,4],summary(geeMonthlyFGC)$coefficients[2,4],summary(geeMonthlyFGC)$coefficients[3,4],summary(geeMonthlyFGC)$coefficients[4,4],summary(geeMonthlyFGC)$coefficients[5,4])
PGC[1:3] = "<0.001"
PGC[4] = "0.207"
PGC[5] = "0.557"


library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.1.2
GEETable <- data.frame( 
  Dependent_Variable = c("Home range (annual)","Intercept","Group size","Group size^2","Cumulative rainfall","Rainfall evenness","Average maximum temperature","Home range (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Average daily distance traveled (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Evenness of space use (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature","Fecal glucocorticoids (monthly)","Intercept","Group size","Group size^2","Cumulative rainfall","Average maximum temperature") ,
  Estimate = c("",Estimate,"",EstimateHr,"",EstimateDT,"",EstimateEven,"",EstimateFGC),
  Wald = c("",Wald,"",WaldHR,"",WaldDT,"",WaldEven,"",WaldFGC),
  P = c("",P,"",PHR,"",PDT,"",PEven,"",PGC)
)

GeeKable<- GEETable %>%
  kbl(caption = "Results from GEEs") %>%
  kable_classic(full_width = F, html_font = "Cambria")
GeeKable<- row_spec(GeeKable, row=c(3,4,6,10,11,17,22,23,28,29), bold = TRUE)
GeeKable<- row_spec(GeeKable, row=c(1,8,14,20,26), font_size=18, italic=TRUE)
GeeKable
Results from GEEs
Dependent_Variable Estimate Wald P
Home range (annual)
Intercept 6.87 4.132 0.053
Group size -1.47e-02 8.768 0.049
Group size^2 1.63e-04 19.753 0.011
Cumulative rainfall 8.38e-05 0.574 0.740
Rainfall evenness -1.46e+00 7.682 0.019
Average maximum temperature -9.23e-02 1.038 0.336
Home range (monthly)
Intercept 2.720 17.4941 <0.001
Group size -3.659e-02 19.7457 <0.001
Group size^2 3.168e-04 20.3584 <0.001
Cumulative rainfall -1.824e-04 0.0929 0.760
Average maximum temperature 1.222e-02 0.5100 0.475
Average daily distance traveled (monthly)
Intercept 1.942 50.493 <0.001
Group size -8.927e-03 3.192 0.074
Group size^2 8.055e-05 3.918 0.048
Cumulative rainfall -3.205e-04 3.613 0.057
Average maximum temperature -5.554e-03 0.854 0.355
Evenness of space use (monthly)
Intercept 4.830 63.4442 <0.001
Group size -2.199e-02 10.0817 <0.001
Group size^2 1.927e-04 10.9598 <0.001
Cumulative rainfall 1.510e-04 0.1325 0.716
Average maximum temperature -3.981e-03 0.0687 0.793
Fecal glucocorticoids (monthly)
Intercept 4.584 812.083 <0.001
Group size -8.317e-03 7.247 <0.001
Group size^2 6.835e-05 11.787 <0.001
Cumulative rainfall 2.323e-04 1.593 0.207
Average maximum temperature -2.293e-03 0.344 0.557

Table 1

Step 5: Discussion and Critque

Overall, my replication was successful in reproducing the results of the study by Markham et al. (2015). The authors found that, interestingly, the relationship between group size and variables such as home range size, daily travel distance, evenness of space use, and fecal glucocorticoid concentrations (fGC) was best modeled as a quadratic with a U-shaped relationship. Intermediate groups had the optimum group size and thus displayed the lowest home range sizes, daily travel distances, evenness of space use, and fecal glucocorticoid concentrations (fGC). In contrast, both smaller and larger groups had greater values for each of these variables. Energetically, this suggests that both smaller and larger groups suffer the cost of traveling farther while foraging. Regarding health, non-intermediate groups may experience higher glucocorticoid concentrations as a result of increased activity or higher stress. This fits with the literature suggesting greater within-group competition for larger groups, but why do smaller groups also need to travel more? The authors posit that small groups may be at a disadvantage regarding between-group competition with other baboons and as such may not gain access to preferred food resources. In addition, small groups may also be at a greater risk of predation which could also cause displacement from resource rich areas. Markham et al. (2015) discuss that while some primates in fission-fusion societies may experience fluctuating group sizes, baboons typically have stable group sizes with more or less the same individuals comprising a group for long periods of time. As a result, group size may have long lasting fitness consequences for baboons. Finally, they consider that this mechanism of optimal group size may explain the versatility and persistence of baboons in a variety of habitats.

Undoubtedly, this study produced valuable and interesting results that helped change our understanding of group size and its predictors. However, my replication has revealed that reproducibility in science is strongly hindered both by standardization of statistical methods and specificity of statistical components. Regarding the former, SPSS is an expensive program that is not open source which limits the ability of individuals such as myself to completely and accurately reproduce the data in this study. Further, I would argue that the consensus for most academic researchers is to use R for statistical methods. Having a consensus such as this ideally offers an optimal environment for reproducing science, as once someone has learned a given programming language such as R they can at least try their hand at replicating any study done via R. Regarding the latter, the exact specifications of the models were not completely clear. For example, the authors did not include how the bivariate relationships were assessed and plotted. More attention should be placed in explicitly laying out statistical methods, even if they seem self-explanatory. Mic drop

References

Ezenwa, V. O., Ghai, R. R., McKay, A. F., & Williams, A. E. (2016). Group living and pathogen infection revisited. Current Opinion in Behavioral Sciences, 12, 66-72.

Hamilton, W. D. (1971). Geometry for the selfish herd. Journal of theoretical Biology, 31(2), 295-311.

Majolo, B., de Bortoli Vizioli, A., & Schino, G. (2008). Costs and benefits of group living in primates: group size effects on behaviour and demography. Animal Behaviour, 76(4), 1235-1247.